AITopics | scale vector

Collaborating Authors

scale vector

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models

Wang, Mingze, Zhu, Shuchen, Fang, Yuxin, Li, Binghui, Shen, Kai, Zhong, Shu

arXiv.org Machine LearningMay-27-2026

Normalization layers in modern large language models (LLMs) consist of a deterministic normalization operation and a learnable scale vector. While the normalization operation has been extensively studied, the scale vector remains poorly understood despite its ubiquitous use. In this work, we present a systematic study of scale vectors in LLMs from the perspectives of expressivity, optimization, and architectural structure. First, we show empirically that although scale vectors constitute only a negligible fraction of model parameters, removing them substantially degrades LLM pre-training. Our theory further shows that, in Pre-Norm architectures, scale vectors do not increase expressivity; instead, they improve optimization through a self-amplifying preconditioning effect on subsequent linear mappings. Second, we investigate the role of weight decay for scale vectors. By distinguishing Input-Norm and Output-Norm layers, we theoretically show that weight decay is beneficial for the former but harmful for the latter, due to their distinct roles in optimization and expressivity. Third, motivated by this understanding, we propose three lightweight and complementary improvements to scale vectors: branch-specific heterogeneity, improved placement around linear mappings, and magnitude-direction reparameterization. Both theory and experiments show that each improvement yields consistent gains. Finally, we combine these improvements into a unified scale-vector strategy and evaluate it through extensive LLM pre-training experiments on dense and mixture-of-experts models ranging from 0.12B to 2B parameters, across multiple optimizers and learning rate schedules, under industrial-scale token budgets. The unified strategy consistently achieves lower terminal loss than well-tuned baselines and exhibits more favorable scaling behavior, while adding negligible parameter and computational overhead.

large language model, natural language, scale vector, (16 more...)

arXiv.org Machine Learning

2605.26895

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

NeuSpin: Design of a Reliable Edge Neuromorphic System Based on Spintronics for Green AI

Ahmed, Soyed Tuhin, Danouchi, Kamal, Prenat, Guillaume, Anghel, Lorena, Tahoori, Mehdi B.

arXiv.org Artificial IntelligenceJan-11-2024

Internet of Things (IoT) and smart wearable devices for personalized healthcare will require storing and computing ever-increasing amounts of data. The key requirements for these devices are ultra-low-power, high-processing capabilities, autonomy at low cost, as well as reliability and accuracy to enable Green AI at the edge. Artificial Intelligence (AI) models, especially Bayesian Neural Networks (BayNNs) are resource-intensive and face challenges with traditional computing architectures due to the memory wall problem. Computing-in-Memory (CIM) with emerging resistive memories offers a solution by combining memory blocks and computing units for higher efficiency and lower power consumption. However, implementing BayNNs on CIM hardware, particularly with spintronic technologies, presents technical challenges due to variability and manufacturing defects. The NeuSPIN project aims to address these challenges through full-stack hardware and software co-design, developing novel algorithmic and circuit design approaches to enhance the performance, energy-efficiency and robustness of BayNNs on sprintronic-based CIM platforms.

architecture, baynn, implementation, (13 more...)

arXiv.org Artificial Intelligence

2401.06195

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.05)

Genre: Research Report (0.82)

Industry: Information Technology (1.00)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

Scale-Dropout: Estimating Uncertainty in Deep Neural Networks Using Stochastic Scale

Ahmed, Soyed Tuhin, Danouchi, Kamal, Hefenbrock, Michael, Prenat, Guillaume, Anghel, Lorena, Tahoori, Mehdi B.

arXiv.org Artificial IntelligenceJan-11-2024

Uncertainty estimation in Neural Networks (NNs) is vital in improving reliability and confidence in predictions, particularly in safety-critical applications. Bayesian Neural Networks (BayNNs) with Dropout as an approximation offer a systematic approach to quantifying uncertainty, but they inherently suffer from high hardware overhead in terms of power, memory, and computation. Thus, the applicability of BayNNs to edge devices with limited resources or to high-performance applications is challenging. Some of the inherent costs of BayNNs can be reduced by accelerating them in hardware on a Computation-In-Memory (CIM) architecture with spintronic memories and binarizing their parameters. However, numerous stochastic units are required to implement conventional dropout-based BayNN. In this paper, we propose the Scale Dropout, a novel regularization technique for Binary Neural Networks (BNNs), and Monte Carlo-Scale Dropout (MC-Scale Dropout)-based BayNNs for efficient uncertainty estimation. Our approach requires only one stochastic unit for the entire model, irrespective of the model size, leading to a highly scalable Bayesian NN. Furthermore, we introduce a novel Spintronic memory-based CIM architecture for the proposed BayNN that achieves more than $100\times$ energy savings compared to the state-of-the-art. We validated our method to show up to a $1\%$ improvement in predictive performance and superior uncertainty estimates compared to related works.

architecture, dropout, neural network, (15 more...)

arXiv.org Artificial Intelligence

2311.15816

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.05)
(9 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine (0.67)

Add feedback

Structured Ensembles: an Approach to Reduce the Memory Footprint of Ensemble Methods

Pomponi, Jary, Scardapane, Simone, Uncini, Aurelio

arXiv.org Artificial IntelligenceMay-6-2021

In this paper, we propose a novel ensembling technique for deep neural networks, which is able to drastically reduce the required memory compared to alternative approaches. In particular, we propose to extract multiple sub-networks from a single, untrained neural network by solving an end-to-end optimization task combining differentiable scaling over the original architecture, with multiple regularization terms favouring the diversity of the ensemble. Since our proposal aims to detect and extract sub-structures, we call it Structured Ensemble. On a large experimental evaluation, we show that our method can achieve higher or comparable accuracy to competing methods while requiring significantly less storage. In addition, we evaluate our ensembles in terms of predictive calibration and uncertainty, showing they compare favourably with the state-of-the-art. Finally, we draw a link with the continual learning literature, and we propose a modification of our framework to handle continuous streams of tasks with a sub-linear memory cost. We compare with a number of alternative strategies to mitigate catastrophic forgetting, highlighting advantages in terms of average accuracy and memory.

ensemble, scale vector, structured ensemble, (12 more...)

arXiv.org Artificial Intelligence

2105.02551

Country: Europe > Italy > Lazio > Rome (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education > Curriculum (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback